The Ontario Ministry of Education recently implemented the Steps to English
Proficiency (STEP) language assessment framework to build educator capacity 
for addressing the needs of English language learners (ELLs) in K-12 schools. 
The STEP framework is a set of descriptors-based language proficiency scales that 
specify observable linguistic behaviours from which educators can make inferences 
about students' English language development. Teachers use these proficiency 
scales to assess, document, and track students' language proficiency development 
based on daily interactions with students in classrooms. The purpose of this article 
is to report on teachers' perceptions of and experiences with the STEP proficiency 
scales during a three-year pilot implementation and validation study of the
initiative. Based on analysis of these findings, we articulate implications for building
teachers' assessment capacity using observational language assessment scales for 
K-12 ELLs. 

English-medium schools in the province of Ontario have a large number of
students learning the language of instruction while at the same time learning
content curriculum. For example, among the 2.1 million students in the
province, 27% were born outside of Canada, and a large proportion of these
students speak a language other than English or French, the two official
languages of Canada, at home (Gallagher, 2014). Further, educators are
increasingly recognizing the language learning needs of Canadian-born children
for whom English is an additional language. Both immigrant and domestic
English language learners (ELLs) face the challenge of learning academic
subjects in a new language, proficiency in which is necessary to set them on a
positive trajectory for curriculum learning and academic success. Over the past
10 years, the Ontario Ministry of Education has developed policies and
resources to build educator capacity for addressing these students' learning
needs. Among these resources is the Steps to English Proficiency (STEP)
language assessment framework. The STEP framework primarily consists of
grade-specific, descriptors-based proficiency continua. The STEP continua
specify observable linguistic behaviours from which educators can make
inferences about students' English language development. Teachers use these
proficiency scales to assess, document, and track students' language
proficiency development based on daily interactions with students in
classrooms. The purpose of this article is to report on teachers' perceptions
of and experiences with the STEP proficiency scales during a three-year pilot
implementation and validation study of the initiative. Based on analysis of
these findings, we articulate implications for building teachers' assessment
capacity using observational language assessment scales for K-12 ELLs.

Support for K-12 English Language Learners in Ontario Schools 

In Canada, education is provincially mandated; therefore, each province is
responsible for the design and implementation of initiatives to support and
guide learning in curricular contexts. Ontario is one of Canada's largest
provinces and is its most populous (Statistics Canada, 2014). As a
consequence, there are many differences in how Ontario schools provide support
for ELLs due to demographics and the diversity of board-level priorities
across the province. For instance, school boards with a large population of
ELLs may have schools with one or more full-time English as a Second Language
(ESL) teachers, whereas school boards with a smaller population may have
itinerant ESL teachers who provide support to a family of schools. Similarly,
some students have the opportunity to attend their local school and receive
ESL support, whereas others may need to be transported by bus to a hub school
that provides ESL support within the district. Language teaching and learning
support provided to individual students in Ontario schools depends on both the
learner's level of English language proficiency and school resources. At the
elementary level, students at beginning stages of proficiency are often placed
in self-contained ESL classes for part of the school day and integrated into
mainstream classes for the remainder of the day. Students at higher levels of
proficiency tend to be fully integrated into mainstream classes, with
monitoring and support provided by an ESL teacher working in collaboration
with the classroom teacher. At the secondary level, students can take up to
five ESL credit courses in place of mainstream English courses, with the
possibility of adding non-ESL English courses when they are prepared to do so.
Some secondary schools also offer locally developed subject-area courses such
as ESL history and science for ELLs, depending on the level of demand and the
resources available.

School boards, and in some cases individual schools, have considerable 
discretion concerning the delivery of ESL programs, leading to different levels 
of support being provided to students with similar needs (Auditor General of 
Ontario, 2005). Furthermore, there has been no consistent mode of assessing 
and tracking the developmental trajectories of ELLs in Ontario schools, nor 
has there been a consistent and common language for discussing students' 
progression across grades, schools, and districts. In 2005, the Auditor General 
of Ontario issued a report calling for a more consistent approach to meeting 
the needs of ELLs and both ESL and mainstream teachers in the province. 
The Ministry initiated the development of STEP as a means to achieve these 
purposes, providing a resource for ESL and mainstream classroom teachers 
and language assessors to use to identify, describe, and monitor students' 
English language proficiency development. Further, STEP was developed to 
build capacity for 

• determining student placement; 

• supporting planning and programming decisions; 

• implementing differentiated instruction and assessment; 

• selecting appropriate teaching and learning resources; 

• making decisions regarding student participation in and support for 
large-scale assessment; 

• engaging students in self-assessment and goal setting; 

• identifying possible special learning needs; 

• providing students and parents with accurate indications of the child's 
level of English language acquisition and literacy development; 

• determining discontinuation of ESL support; 

• promoting reflective teacher practice; 

• providing an opportunity to focus teacher reflection and professional
dialogue (Ontario Ministry of Education, 2012, p. 3)


STEP Language Proficiency Descriptor Scales 

The STEP framework is a set of descriptors-based language proficiency scales 
to be used by teachers based on their daily interactions with students in their 
classes. The framework includes four sets of scales, one for each grade cluster:
primary (grades 1-3), junior (grades 4-6), intermediate (grades 7-8), and
secondary (grades 9-12). The STEP scales comprise descriptors targeting three
areas of language use: Oral Communication, Reading and Responding, and 
Writing based on the ESL curriculum (please refer to the Ontario Ministry 
of Education 2007 curriculum documents). Separate scales are provided for 
students in English Literacy Development (ELD) programs, reflecting the fact 
that the latter group has experienced gaps in their educational background 
and/or opportunities for literacy development in their home language.¹ The
descriptors in each scale articulate a cumulative progression through
increasingly complex forms of communicative competence, comprising six "steps"
of language proficiency development. Sample descriptors for each of the 
three language modalities, and across six STEPs for students in Grades 1 to 
3, are shown in Table 1. 

Designed specifically for use in the Ontario educational context, the
descriptors focus on linguistic performances that are observable by teachers
during curriculum learning tasks. In Table 1, the column containing "Elements"
illustrates the connection between the curricular focus (or element) and the
linguistic behaviours that may be observed over time as students learn
content. Teachers' classroom-based tasks provide a context and a point of
reference for observing and gathering evidence to assess learners' language
proficiency development. For these reasons, the descriptors relate to and
articulate the communicative demands of the Ontario curriculum. Our validation
research (Jang, Cummins, Wagner, Stille, & Dunlop, 2015; Jang, Wagner, &
Stille, 2011) provided evidence that these descriptors act as generally
distinct stages, indicating the stability of the scales in distinguishing
among six proficiency levels. Additionally, the research demonstrated that
average difficulty increases when moving from Step 1 to Step 6, and that the
scales reflect the developmental nature of language and curriculum learning
across grade clusters (e.g., Steps 1-2 for Grade 1, Steps 3-4 for Grade 2,
etc.).

Language Proficiency Descriptor Scales in the Educational Context

Development of language proficiency descriptor scales emerged both from
a need to describe what students can do at various levels of proficiency
development, and from the standards-based movement in the United States,
wherein standards articulate expectations of what students should know and
be able to do (Bailey & Huang, 2011). Descriptors-based language proficiency
scales are part of a global assessment movement that uses these frameworks
to assess students' language proficiency development. For example, the
descriptors-based Common European Framework of Reference (CEFR) for
Languages (Council of Europe, 2001) has been translated into many European
languages and has been widely implemented in a variety of textbooks,
curricula, and examinations across Europe (Alderson, 2005). Other
descriptors-based proficiency assessments include the Canadian Language
Benchmarks (CLB), the American Council on the Teaching of Foreign Languages
(ACTFL), and World-Class Instructional Design and Assessment (WIDA) English
Language Proficiency (Jang et al., 2011). STEP is distinguished from these
scales as it is used to track and assess school-aged learners' language
proficiency development.

Table 1

Sample Primary Level (Grades 1-3) Descriptors Across Six STEPs (Ontario Ministry of Education, 2012a)

Much scholarly research has focused on development of descriptor scales, 
illustrating both the challenges in their development and the potential for 
these scales to describe language proficiency development in a way that is 
useful to teachers (Byrnes, 2008; Davison, 2007; McKay, 2000; North, 1993; 
North & Schneider, 1998; Scott, 2009; Scott & Erduran, 2004). Rather less
attention has been paid to how these descriptors-based proficiency scales have
influenced teacher instruction and assessment practice or to their usefulness
for students. Several recent studies have focused on highlighting the
diagnostic and formative purposes of descriptors-based proficiency scales 
for language teaching and learning (Brindley, 1998; Colby-Kelly & Turner, 
2007; Davison, 2004; Hamp-Lyons, 2007; Rea-Dickens, 2004, 2007; Teasdale 
& Leung, 2000). 

Language proficiency descriptor scales such as the CEFR (Council of
Europe, 2001), the WIDA English Language Proficiency Standards and Assessing
Comprehension and Communication in English State to State (ACCESS)
assessment (Kenyon, MacGregor, Li, & Cook, 2011; WIDA, 2012), the ESL
Standards for Pre-K-12 students in the United States (Teachers of English to
Speakers of Other Languages, Inc., 2006), the ACCLES in Hong Kong (Davison,
2004), and the National Languages and Literacy Institute of Australia
(NLLIA) ESL Bandscales (McKay, 2007; Scott & Erduran, 2004) have been
used to assist teachers in understanding and interpreting learners' language
proficiency development. The development and use of these scales have
highlighted issues and challenges arising out of the intersection between
language assessment and classroom curricula and pedagogy. In particular,
research has drawn attention to issues in construct definition and the
challenges in operationalizing language proficiency development in the context
of content-area learning (Davison & Leung, 2009; Little, 2010; McKay, 2000;
Scott, 2009). For instance, higher levels in second-language assessment scales
tend to reflect learners' cognitive skills and educational experiences
(Hulstijn, 2011), making it difficult for teachers to distinguish students'
language proficiency development from higher-order thinking skills. Similarly,
alignment with curriculum can make it difficult to reliably distinguish
students' English language proficiency levels, and teachers may have
difficulty distinguishing students' language proficiency development from
their subject knowledge (Jang et al., 2015). Moreover, as social and socially
situated activities, classroom-based assessments are not wholly reflective of
individual cognitive processes, but also reflect social, affective, and
academic circumstances and learners' instructional learning experiences.
Language acquisition and language use in the classroom can therefore be seen
as a unified process (Lantolf, 2009).

Along with increased use of descriptors-based language proficiency 
scales for assessment, language testing literature has called for research into 
classroom-based language assessment (CBLA) to explore the relationship 
between standardized testing and observation-driven assessments of
language proficiency (Leung & Mohan, 2004; McNamara, 2001; Rea-Dickens,
2004; Shohamy, 2004). Unlike normative standardized tests, which measure 
learners' language proficiency development at one point in time against a 
predetermined population norm, CBLA takes place during everyday
teaching and learning activities, allowing educators to evaluate learners' growth
in language competencies over time and on multiple occasions, using diverse 
modes and forms of communicative interactions in the classroom. Examining 
and reflecting upon this evidence of learners' communicative performances, 
educators can then use language proficiency descriptor scales to judge
learners' proficiency levels and identify areas of future learning. Situated within
the classroom and embedded in educators' ongoing instructional practice, 
CBLA promotes authentic assessment in a naturally occurring language 
learning context (Brindley, 2001; Chalhoub-Deville, 2003; Scott, 2009; Shepard, 
2002; Wigglesworth, 2008). This congruence between learning, teaching, and
assessment supports a degree of ecological validity, in that students are
assessed in the way they have been taught and within the context of their 
language use (Whitehead, 2007). 

Teachers' Use of Descriptors-Based Language Proficiency Scales 

The use of descriptors-based proficiency scales in the educational context has
led to an interest in challenges that teachers face in performing
classroom-based language assessment. Studies exploring teachers' assessment
practices and their use of proficiency scales have demonstrated teachers' needs
associated with implementing effective language assessment in their classrooms,
including the need for clear and interpretable proficiency scales (Davison &
Leung, 2009; Llosa, 2011; Rea-Dickens, 2004). Research has also documented
several challenges, including variability in teachers' assessments based on
their views of assessment criteria (Butler, 2009), their perceptions of student
motivation (Butler, 2009), their assessment literacy (Fulcher, 2012;
Inbar-Lourie, 2013; Malone, 2013; Taylor, 2009, 2013), and the complex nature
of the classroom context (Brindley, 1998). For instance, investigating
teachers' language assessment practices, Davison (2004) described several of
these challenges. First, assessment criteria are interpreted differently by
teachers according to their personal background, previous experience, and
expectations regarding the relative importance and meaning of language
assessment criteria, causing teachers to differ from one another in their
interpretation of, response to, and use of these criteria. Second,
teacher-based assessment is not a technical activity: it relies upon teachers'
professional judgement and interpretation. Although descriptor scales may
provide a basis for understanding learners' linguistic development, teachers
understand and interpret these criteria in relation to their experience with
real students and their linguistic performances during everyday classroom
activities. Third, teachers' interpretation, negotiation, and discussion of
their assessment decisions contribute to the validity and reliability of
teachers' judgements. Finally, teachers may vary in the extent to which they
will accept externally imposed criteria as a basis for their professional
judgement about learners' language development. These multiple challenges are
compounded by the diversity and variability of the contexts in which the
scales are used.

How descriptors-based proficiency scales are used depends largely on the
role of teachers. Teachers' roles and responsibilities often include, for
instance, planning assessment activities, collecting samples of student work,
and interpreting and making judgements about students' linguistic
performances; monitoring, adapting, and modifying assessment depending on
teaching and learning goals; and giving immediate and constructive feedback
to students (Davison & Leung, 2009). Considering the interactionalist
perspective that communicative language ability is the result of interactions
between intrapersonal linguistic traits and the situational context
(Chalhoub-Deville, 2003; Chapelle, 1998; Wiliam, 2010), the context in which
language learning takes place is an important aspect in the use of
descriptors-based proficiency scales. Teachers need to be able to create
communicative contexts by designing tasks that elicit the specific observable
behaviours from which meaningful inferences about students' language ability
can be made. Teachers therefore require some degree of assessment literacy to
support their effective use of descriptor scales.

Teachers' language assessment competence refers to the knowledge, skills,
and abilities that teachers need to implement language assessment activities
and interpret students' language skills and development (Fulcher, 2012).
These competences are important, as teachers' interpretations and decisions
about students' language abilities may have impact beyond the confines of
the classroom (Edelenbos & Kubanek-German, 2004; Jang et al., 2015). For
example, policy-related decisions such as the necessity to offer
accommodations during standardized testing, or even whether it is appropriate
for some language learners to write standardized assessments (particularly
during the initial phase of their language acquisition), may be determined by
teachers' interpretations of students' language abilities. Other
language-related decisions such as provision of supports or advancement may
also be influenced or determined by teachers' evaluation. While CBLA promotes
teachers' agency in assessment in general, it also points to a considerable
need to develop teachers' assessment competencies. These concerns point to the
role that descriptors-based proficiency scales might play in supporting
teachers' development of assessment competencies, potentially increasing
teachers' knowledge of and communication about language proficiency
development (Cummins et al., 2009; Jones & Saville, 2005), and assisting with
teachers' professional judgements about learners' language development needs
(Cumming, 2009; Davison & Leung, 2009).

Research Methods 

Our research team was commissioned by the Ontario Ministry of Education
to conduct a validation study of the STEP proficiency scales, beginning
in September 2008 and continuing to September 2011. The objectives of the 
study were to (a) empirically verify the STEP scales by collecting samples 
of student performance that exemplify the levels of proficiency on the STEP 
scales, (b) examine the linguistic and cultural sensitivity of the STEP
descriptors, and (c) evaluate the impact of the STEP assessment on teaching and
learning. 

We gathered multiple types of data that documented teachers' use of the
STEP scales from 42 ESL and classroom teachers and 159 students across
three school districts.² Table 2 displays the distribution of the teachers
across the different grade clusters. Most of these teachers (87%) had 6 or
more years of teaching experience. The participating teachers were evenly
distributed among subject specialists, mainstream teachers, and ESL teachers.
Teachers were required to observe and sample students' language performances
during student learning activities, and to interpret and evaluate these
performances to make decisions about students' language proficiency along the
STEP continuum in each of the three language modalities. Specifically,
teachers tracked students' mastery of the observable linguistic behaviours
described in the STEP continua for the Oral Communication, Reading and
Responding, and Writing modalities. This decision-making process was based on
the evidence that teachers gathered about students' language learning over
several weeks of instructional time through informal observations,
discussions with students, and planned assessment activities (e.g., writing
assignments, presentations). This article focuses on the impact of the STEP
scales on teaching practice by drawing on the various types of data gathered
during this period of teachers' evidence-gathering and decision-making about
students' linguistic behaviours. The data sources included classroom
observations, teacher and student interviews, samples of students' written
work, and digital video documentation of teachers' assessment activities
using STEP. Herein, we describe the process of collecting these data and
their analyses in more detail.

Table 2

Distribution of Participating Teachers Across Grade Clusters

Grade cluster          Number of teachers
Primary                19
Junior                 7
Intermediate           7
Secondary              8
Grade not specified    1
Total                  42

During our classroom visits, teachers were encouraged to showcase 
the specific needs and interests of their students as well as their classroom 
practices. Classroom observations were conducted using a preconstructed 
protocol in order to observe teachers' classroom and feedback practices. In 
particular, we were interested in observing the facilitation of classroom-based 
tasks and activities and the extent to which (a) teachers scaffolded learning, 
(b) opportunities existed for teacher-student interactions and provision of 
feedback to learners, (c) language and literacy behaviours could be observed 
across different modalities, and (d) teachers employed strategies to assist and 
reinforce language learning. These activities were documented using field 
notes, as well as audio and video recordings. All of the observational data 
were transcribed and coded (using NVivo version 9) according to these four 
categories. We further coded the data thematically using both inductive (to 
seek new themes) and deductive (to identify more specific features of the 
aforementioned categories) approaches. 

Teachers were interviewed in order to gather more information about
their classroom practices, their processes of gathering evidence about
students' language learning and development, and their post-STEP reflections.
We aimed to understand how STEP influenced these educators and asked
specific questions about their teaching practices, understanding of ELLs'
language development, assessment practices, and challenges they may have
encountered while using it. These data were similarly transcribed and
thematically analyzed.

Students' work samples were gathered and used to further understand 
how teachers interpreted the evidence of students' performance, and the
indicators they used to make decisions about students' mastery of STEP
observable linguistic behaviours. The samples were also used to understand
teachers' feedback practices and the ways in which they identified gaps in 
students' learning and guided learners towards addressing them. These 
samples included work across each of the three STEP modalities, and were 
matched to students' STEP profiles and background information about the 
students. 

The multiple data from teachers and classrooms across different grade 
levels provided the opportunity to elaborate and complement the results of 
one set of data analyses with those of another. Consequently, the process 
of data analysis was recurrent and iterative; we revisited codes and themes 
across all of the data sources to revise, reanalyze, and further augment our
understanding and interpretation of the results. Accordingly, we were able to
synthesize our findings across the different data sources and types.

Results 

The analyses of the multiple types of data allowed us to generate findings
about the use of STEP and its impact on teaching activities, including
teachers' understanding of students' language development, teachers'
understanding of the role of instruction and feedback in language learning,
and teachers' use of the scales to support diagnostic and formative purposes
of assessment. These data also highlighted challenges in classroom-based
language assessment relating to the interdependence of language and literacy
development in the context of school-based learning. We now report these
findings in detail, drawing from a synthesis of the results from the
different data sources and analyses.

Facilitating Formative Language Assessment 

One of the major findings emerging from the data analyses was that the STEP
scales operated as an overarching language assessment framework that
supported curricular planning, instructional activities, and assessment of and
feedback to students. The findings suggested that teachers' use of the STEP
scales facilitated ongoing, informal assessments of students' language
progress. Teachers used their judgements about students' progress to inform
their teaching and assessment strategies, and to communicate with students
about their language development. These formative and summative uses of STEP
may have contributed to an assessment-for-learning culture within the
classroom, helping teachers to better understand how to improve students'
language learning and give learners constructive feedback (see, e.g., Biggs,
1998; Carless, 2007; Davison, 2007; Hamp-Lyons, 2007; Harlen, 2005; Taras,
2005).

A major theme that emerged from the analyses of the data across the
multiple sources was that teachers perceived that the proficiency scales could
be used for diagnostic and formative purposes. Using their own observational
evidence as well as students' performance on learning tasks that were already
a part of their instructional repertoire, experienced teachers who participated
in the study reported that they were able to elicit sufficient evidence of
students' linguistic performances to make judgements about students'
proficiency level. Participating teachers also described their ability to use
the STEP scales to describe learners' language development based on the
linguistic performances that they observed in their classrooms. This suggests
that the descriptor scales helped teachers to identify students' initial
language proficiency level, and that each level provided teachers with some
indication of the gradually expanding scope of the learners' strategic range
and underlying linguistic competence. As teachers explained, this information
helped them to recognize and differentiate learners' individual language
learning needs.

Teachers reported that their use of the STEP assessment tool involved 
gathering evidence of students' linguistic performances within the context 
of everyday curriculum learning activities. This evidence included anecdotal 
notes from observations of students' participation in class, as well as other 
classroom-based evidence such as student portfolios, samples of students' 
work, and teacher-developed tests. Reflecting upon this evidence of students' 
linguistic performances, participating teachers explained that they could use
the proficiency scales to help them select future learning goals for students
and plan their instruction accordingly. Participating teachers described their
interest in using the STEP scales to help them know each learner and better
understand learners' language proficiency development. The following
quotation from one teacher illustrates this point:

When I look at his work, at his work folder, I identified him as being 
STEP 2, and [the descriptors] are very specific in terms of what he 
should have accomplished at STEP 2 ... There is a checklist and 
when I flip over to STEP 3, it's nice to see the things that he needs 
to be working on, it's a specific set of things that we need to do. 
(Elementary ESL teacher) 

The results also indicated that the STEP scales drew teachers' attention to
the wide range of learning activities implicitly embedded in the descriptors,
which supported STEP's formative purpose of informing instructional planning
and practice. The alignment of the proficiency scales with curriculum, the
comprehensive nature of the scales, and the incremental progression of
language development in the STEP scales contributed to teachers' ability to
integrate STEP with their instructional practice. Describing what learners
can do at each level, the STEP scales articulated the linguistic performances
students should be able to produce to demonstrate progress. Teachers reported
using this information to identify teaching strategies to help learners
progress, including differentiating instruction, planning instructional
strategies, and designing future curriculum-related and classroom-based
assessment tasks. Another quotation from a teacher exemplifies this finding:

STEP helped me choose a more appropriate form of support for the 
students and the classroom teacher. I found the descriptors helpful to 
give me direction in the sort of strategies I might use to help the children
progress. (Elementary ESL teacher)

Participating teachers also articulated the ways in which they used the 
STEP scales to guide their planning and design of assessment activities. Evidence
gathered during the teacher interviews highlighted how teachers chose
assessment tasks that would enable them to observe specific descriptors.
Using STEP, teachers reflected upon their instructional practice and whether
and how it provides sufficient opportunities for both language learning and
assessment of learners' language proficiency development. The use of the
STEP framework also prompted teachers to identify opportunities for assessment
embedded within curricular activities, and to assess the degree to which
these activities incorporated domains of language development. For instance, 
teachers reported the need to gather multiple sources of evidence as the basis 
for their language assessment activities. Gathering a diverse range of student 
work samples reflects the idea that an individual learner may exhibit a range 
of proficiencies in language use depending on the context or learning task. 
Similarly, the use of a variety of classroom activities for language assessment 
tasks provided learners with several opportunities to display their linguistic 
and pragmatic competencies, and provided teachers with both holistic and 
nuanced understandings of the individual learner and his or her language 
development. The following quotation from one teacher illustrates these concerns:

This is only one sample, one piece of evidence, so when you're 
looking at STEP, we need to have multiple pieces of evidence of 
something before we can see that they've really done it, sometimes 
kids miss something, like I think she missed it [in this instance], so 
I would revisit it with her, talk about it with her a bit more but not 
focus on this evidence too much, but focus on all the other pieces 
of evidence that I have from her, because I don't think that this is 
a fair piece to show her reading comprehension, to say OK, this is 
your summative and I'm using this and only this. (Intermediate ESL 
teacher) 

The data revealed that teachers used scaffolding and various accommodations
to differentiate their STEP assessment activities and support individual
learners' participation and learning in the context of the assessment process.
Among the student work that we gathered, many samples included evidence
of teachers' efforts to scaffold students' production of culminating curricular
activities. For example, we gathered evidence of brainstorming activities,
jot-notes, drafts of written work, and final copies of written work. These kinds of
scaffolding strategies built students' background knowledge, key concepts,
and vocabulary. For assessment of reading using STEP, many teachers provided
samples of graphic organizers they used to provide scaffolding to improve
learners' comprehension of a selected text. Similarly, evidence from
video recordings of teachers' assessment activities demonstrated the use of
strategies such as explicit modelling by the teacher, opportunities for practice
with feedback, and skillful adjustments to accommodate the learner's
oral proficiency level. These strategies contributed to students working in the
metaphorical zone of proximal development (Vygotsky, 1978) during assessment
and helped the learner to perform at a slightly higher level than they
could have done independently.

Participating teachers described the ways in which they use assessment
information to communicate with students about their language learning. 
Teachers used the descriptors in the STEP scales to explain to learners what 
they had achieved and what they needed to focus on next. Using the descriptors
to formulate specific feedback to students, teachers assisted students in
reviewing and reflecting on their language proficiency development. Teachers
described their perception that this information would be useful to help
learners monitor their own language learning progress. This is illustrated by 
the following quotation from one teacher: 

It [STEP] is really good even for my students who aren't English language
learners because it's a reminder to have checkpoints, to have
goals for the kids so they can see "Oh, I really am learning this," 
it helps them see where their learning needs to go. (Intermediate 
teacher) 

However, from observation and interview data, we found that teachers' 
feedback to students during the STEP implementation was minimal, though 
feedback is generally seen as a key function of formative assessment, with 
the goal of improving student learning (Davison & Leung, 2009). This gap 
points to teachers' professional learning needs concerning how to use STEP as 
a checkpoint, in addition to their formative feedback, to support instructional 
and assessment practices in the classroom. 

Supporting Professional Learning 

A second major theme emerging from the data analyses was related to the potential
of STEP as a tool to support professional learning. As teachers reported,
the use of the STEP scales provided them with a shared language in which 
they could recognize and discuss learners' language use and development 
with their students and other teachers. By providing teachers with a common
language and framework of reference, the scales increased the extent of
meaningful collaboration and communication among teachers working with 
ELLs, particularly between ESL and mainstream or subject-area teachers. 
Teacher collaboration and sharing of expertise occurred as teachers discussed 
their planning, assessment activities, and decision-making with other teachers,
which is illustrated by the following quotation from one ESL teacher:

The classroom teacher and I have collaborated ... we have compared 
observations, we have shared concerns, and I think that has been a 
great foundation for a good partnership between the two of us. It 
worked out very well. (Elementary ESL teacher) 

Accordingly, it can be suggested that teachers' use of the STEP scales contributed
to their professional learning, building their knowledge base about the
relationship among language learning, instruction, and assessment. ESL and
classroom teachers who used STEP reported finding it useful to reflect upon
the strategies they used for meeting the needs of ELLs in their classrooms.
These positive outcomes are described in the following illustrative quotations:

For my perspective, a classroom teacher, STEP was almost self-assessment
[of my teaching]. (Elementary teacher)

STEP opens your eyes to ... where students are going and keeps a 
perspective that ESL students are not necessarily going to progress 
as fast in other areas as the regular class will go, and that is OK ... as 
a classroom teacher it brought to my perspective that it may take a 
year for them to get to the next level, it may take two years to get to 
that level, and that is OK. (Elementary teacher) 

Teachers reported using evidence gathered during the STEP assessment 
to support their decisions relating to students' needs at school. For example, 
teachers reported that STEP assessment activities assisted them in collecting
samples of student work and making interpretations of this work which
could be useful for (a) making decisions about when to provide or stop providing
ESL support; (b) tracking students' language progress, particularly as
they move from one school to another; and (c) understanding which students 
are ready to write standardized literacy tests such as the Ontario Secondary 
School Literacy Test (OSSLT), and how these students' performance on the 
provincial tests should be interpreted. This quotation was typical of teachers' 
responses: 

We had a meeting to decide whether Luis and another student in the 
ESL program would be participating in [the standardized provincial 
literacy test] and I relied on the STEP assessment tools to make a case 
that he was in fact ready for it, that his English skills have been built 
sufficiently so that he could participate. (Secondary ESL teacher) 

Teachers reported that they experienced difficulty in distinguishing between
reading problems stemming from low levels of linguistic proficiency
and more general reading/learning difficulties. With enriched understanding
of the stages of language development, teachers explained that they saw
potential for STEP assessment data to clarify distinctions between language
acquisition and the learning processes relating to content area curriculum.
For example, tracking the STEP progress of one student over 18 months, a
teacher noted that the student's progress had stalled beyond what she would
consider normal: he had moved up to the next STEP level on only 1 of approximately
12 descriptors in his oral development, although his reading development
had progressed. The teacher described this circumstance: "If you
see a student that has been here for two years, and they are still at STEP 1, 
then ... [you know] there is something happening here." Teachers explained 
that STEP provides useful evidence of exceptionalities where corroborating 
evidence exists, explaining, "You cannot go into these meetings with how you
feel or what you think. You've got to have concrete evidence." 

Understanding Issues Related to Assessing and Tracking Language 
Learning 

A third theme that emerged from the research investigation was teachers' increased
awareness of issues related to assessing and tracking students' learning
in the classroom. Participating teachers reported on their awareness of
situational factors that influence language development and interpretation
of language development. These factors included students' prior learning
experiences, family history, socioeconomic background, and cultural differences.
In the context of implementing STEP, teachers wondered whether they
were assessing speaking, cognition, personality, or cultural knowledge. For 
example, one teacher explained how he takes a student's background into 
consideration when conducting a STEP assessment: 

We would look at that [descriptor] and say, is it unfair to ask that of 
this student? She would be unable to answer and I would have to 
put her on STEP 1. I did not think the [descriptor] was fair to that
child. We struggled with that. (Elementary ESL teacher) 

Teachers noted that the cultural fairness of classroom contexts had a noticeable
influence upon the assessment of a learner's language development. For
example, cultural differences in oral communication styles might influence a 
learner's classroom behaviours and linguistic performances that are used for 
STEP assessment. Furthermore, learners need support to bridge the cultural 
and interactional patterns from those of their home culture to ones that are 
functional for learning in the Canadian context. The following quotation from 
one teacher exemplifies some of these concerns: 

We have to understand the connections [students are] making 
between stepping out of their first language and into the second 
language ... there are all kinds of things that they need to do to get 
to that transition. Depending on the culture, students learn and connect
in different ways. So making those connections into English
depends on how ... they integrate themselves into a different community
apart from their culture, more experiences will happen, they
will learn from that... and it will be easier for them to move on oral, 
reading, and writing as well. (Elementary ESL teacher) 

Assessing Language Ability in Content Learning 

A final key finding that emerged from the data analyses was related to the 
opportunities afforded by STEP for assessing students' language development
alongside authentic content-based contexts. By prioritizing the K-12
learning context, STEP aligns the development of language ability with Ontario
curricula and embeds assessment into curriculum learning. While the
data have revealed that the use of STEP for describing learners' language
abilities within the context of curriculum learning has contributed to facilitating
collaboration between ESL and classroom teachers, it proved
challenging for classroom teachers to distinguish between literacy
development and language proficiency, partly because the curriculum is not
sufficiently detailed to the point of defining what language ability means
(McKay, 2006). Accordingly, teachers felt that the STEP descriptors were too
tightly aligned with curriculum, and expressed their concern that the STEP 
scales may be limited to assessing literacy development rather than English 
language proficiency. 

Aligning the development of English proficiency with curricular expectations
involves articulating levels of language progression based on actual
learner performance in schools (Byrnes, 2008; North, 2007). The challenges
we observed are not uncommon, as similar concerns have been raised in discussions
of the CEFR, whose descriptors were developed by expert teachers
(North, 1993; North & Schneider, 1998). If the construct of "English language
development" remains indistinguishable from curricular literacy learning,
embedding language proficiency scales into specific content-learning contexts
may not be fully achieved. Such challenges emphasize the importance
of collaborations between ESL and classroom teachers as well as assessment 
specialists and policy makers. 

Discussion 

The findings that emerged from this research illuminate how descriptors-based
language proficiency scales can contribute to the formative purposes
of classroom-based language assessment in the K-12 educational context. The 
use of the STEP scales promoted an assessment for learning culture among 
participating teachers (Black & Wiliam, 1998, 2008; Cumming, 2009; Davison 
& Leung, 2009; Gardner, Wynne, Hayward, & Stobart, 2008), increasing their 
knowledge about language proficiency development, assisting with their professional
judgements (Cumming, 2009; Davison & Leung, 2009), increasing
communication, and promoting teachers' collaboration with peers. We have 
reported elsewhere on teachers' critiques of the STEP framework (Jang et al., 
2011), which include issues relating to teachers' time, professional learning 
needs, and availability of resources. Despite these issues, teachers' perceptions
about the use and impact of STEP suggest that descriptor scales have
the potential to be used alongside existing classroom assessment activities to 
assist with teachers' understandings about learners' capabilities and needs, 
and to develop teachers' knowledge, skills, and abilities in classroom-based 
language assessment. In Ontario, these developments are critical because 
curriculum policy articulates that all teachers, not just those designated as 
ESL teachers, are required to identify the needs and instructional strategies 
necessary to support the language and learning of ELLs in their classrooms,
and all teachers are required to record students' language progression in official
school records. Teachers are often required to meet these objectives with
little background knowledge of issues in language learning and
assessment. Thus, district-level implementation and teachers' use of the STEP 
scales can potentially assist in building province-wide capacity to meet these 
policy requirements. 

Relating to these policy requirements, we can identify several challenges 
concerning how to assist teachers in integrating the use of language proficiency
scales into their instructional practice. For example, teachers need
to understand and differentiate among students' linguistic and pragmatic
competencies, and their overall cognitive and social development and content
area knowledge. Connecting the proficiency scales to language use
using evidence from everyday curriculum learning activities contributes to
an appropriate and representative form of observation of students' linguistic
performance and process of language acquisition. However, reflecting the interdependence
of language, literacy, and content curriculum learning, evaluating
students' language use and performances in the classroom means that
assessing language includes assessing content to some degree. Further, teachers'
understandings of these aspects of language proficiency development in
the learning context relate to their ability to select and use appropriate tasks 
for assessing language proficiency development. 

Initially, the articulated purpose of the STEP proficiency scales was to 
identify, monitor, and track the progress of English language learners in Ontario
schools. Over time, the Ministry recognized that teachers' use of STEP
could serve multiple purposes, including directing teachers' instructional
goals and activities, guiding formative purposes of language assessment,
supporting teachers' professional learning, and building system-wide capacity
for supporting ELLs. To meet these multiple purposes of STEP in a
manner consistent with the scholarly literature and evidence-based practice 
in classroom-based language assessment, we propose a framework for the 
development of teacher language assessment competence that highlights the 
kinds of knowledge and skills that teachers need to use descriptors-based 
language proficiency scales effectively in their classrooms. This framework 
is outlined below: 

• Assessment tasks. Teachers' assessment activities should be sufficiently
rich to support the learners in showcasing the full range of their communicative
and linguistic abilities at, or slightly above, their level of competence.

• Instructional strategies to scaffold assessment. Assessment activities 
should be scaffolded with appropriate instructional strategies to assist 
the learners in accomplishing the task at, or slightly above, their level 
of competence; teachers' evaluation of this performance can account for 
this scaffolding, based on the assumption that students progress from
accomplishing linguistic performances with assistance at the lower levels 
of proficiency and move to independent accomplishments at the higher 
levels of proficiency. 

• Observational abilities. Teachers need the ability to notice learners' linguistic
and communicative performances during everyday classroom
interactions, and need to gather observational data about these performances;
this includes recording observational notes about the learner and
gathering samples of his or her work to reflect on, in a manner that is 
sufficient to assist the teacher in making an appropriate judgement about 
the learner's language performance. 

• Interpretive abilities. Teachers need the ability to interpret students' 
linguistic and communicative performances on everyday teaching and 
learning activities; they need the ability to distinguish among a learner's 
language development, his or her sociocultural competence, and his or 
her curriculum learning. 

• Use. Teachers need the ability to use the proficiency scales to inform their 
instructional practice; for instance, using descriptors to guide instruction, 
to define what students need to know, to periodically gauge the learner's 
understanding, and to give the learner descriptive feedback to help him 
or her reach those goals. 

Conclusion 

Educators play a critical role in supporting the academic success of English 
language learners in Ontario schools, and these students have unique language
and learning needs. The research reported here suggests that effective
use of descriptors-based language proficiency scales for classroom-based language
assessment can potentially promote professional learning for educators
and build capacity for schools to meet the needs of ELLs. This research contributes
to the articulation of classroom-based language assessment practices,
drawing on the perspective of teachers to provide a nuanced view of the
relationship between language, language learning, and language assessment
in the K-12 educational context. Documenting teachers' perceptions about the 
use and impact of descriptors-based language proficiency scales contributes 
evidence that can be drawn upon to theorize and operationalize the constructs
underlying the processes of classroom-based language learning, and
the assessment practices that measure these.

Notes

1 English for Literacy Development (ELD) is a term used in Ontario education to refer to 
English language learners with limited prior schooling, circumstances that have led to gaps in 
these students' academic literacy and numeracy. The term ELD is also used in Ontario education 
to refer to programs specifically designed to address these unique learning needs, including not 
only English language teaching, but also development of numeracy concepts and literacy skills. 

2 The study focused on the STEP scales for ESL only, not ELD, because the ELD descriptors 
were still in the process of development at the time the study began. 